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Analytic models of two computer generated time series (Logistic map and Rossler system) and 
two real time series (ion saturation current in Aditya Tokamak plasma and NASDAQ composite 
index) are constructed using Genetic Programming (GP) framework. In each case, the optimal map 
that results from fitting part of the data set also provides a very good description of rest of the 
data. Predictions made using the map iteratively range from being very good to fair. 
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I. INTRODUCTION 

The problem of constructing models of complex dy- 
namic systems from its time series is of great interest 
in various fields of science and economics. This is impor- 
tant both for insights they may provide into the dynamics 
and for making predictions. Since the dynamics of such 
systems is expected to be non linear in nature, different 
methods have been suggested for constructing the mod- 
els pj, 0- These include local linear models, radial basis 
function approach, artificial neural networks (ANN), ge- 
netic algorithm (GA) and genetic programming (GP). In 
this note, we work with the Genetic Programming frame- 
work similar to that used by Szpiro and extend it in 
some ways. A brief description of GP is given in Sec. II. 

The approach is first tested with two deterministic 
chaotic time series (Logistic map and Rossler system) 
J5J . We obtain an excellent representation of the data in 
terms of a non-linear regressive map. In order to further 
validate the model we also make dynamic predictions. In 
our view, this is a crucial test of the model. For the Lo- 
gistic map we find impressive results but for the Rossler 
map the results are not so good (Sec. III). 

We next consider two measured or observed time se- 
ries. The first one is the time series of measurement 
of ion saturation current in Aditya Tokamak plasma Q 
and the second one is the finanical NASDAQ composite 
index. These series are non-stationary and exhibit fluc- 
tuations that are statistical in nature. In view of this we 
first separate the fluctuations from the underlying mean 
behaviour of the series using wavelet transforms [|| A 
brief account of wavelets is given in Sec. IV A. Fur- 
ther, as in statistical physics (e.g. Langevian equation 
for Brownian particle) we assume mean dynamics to be 
deterministic and fluctuations to be stochastic. In order 
to model the mean dynamics in the GP framework we 
first construct an appropriate embedding in the recon- 
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structed phase space for each time series. This is briefly 
summarized in Sec. IV B. The properties of the em- 
bedding together with genetic programming then gives 
us an analytical expression for the map. Interestingly, 
the optimal non-linear model that is generated in this 
manner is of the Pade form Q- They have many in- 
teresting mathematical properties, e.g. same or better 
convergence properties as compared to power series and 
can model functions with singularities. We find that the 
maps describe the dynamics quite well and give good one 
step predictions. However, when iterated, the prediction 
deteriorates rapidly. Our results for the Aditya Toka- 
mak data and those for the NASDAQ composite index 
are presented in Sec. IV C and D respectively. Sec. V 
contains some concluding remarks. 



II. GENETIC PROGRAMMING 

Under the umbrella of Evolutionary Algorithms (EA) 
0- 01 E3, various approaches like Genetic Algorithm 
(GA), Genetic Programming (GP), Evolution Strategies 
(ES) etc have been framed that are aimed at solving com- 
plex search and optimization problems. The approaches 
differ from one another by the implementation details of 
genetic structures and the use of various genetic opera- 
tors. In the present note we have used Genetic Program- 
ming (GP) framework that uses a nonlinear structure 
for chromosomes (details given in Appendix) that rep- 
resent candidate solutions, as against the more popular 
approach of Genetic Algorithm (GA) that uses linear 
structures of chromosomes. 

Genetic Programming (GP) uses an iterative compu- 
tation to progressively get better and better candidate 
solutions. We first initialize the population P(timc t=0) 
randomly with chromosomes generated as per a template 
structure, such as 

((A ® B) ® (C®D)) 

where A, B, C, D are either time lagged variables (to 
be described later) or real numbers and £g> is one of the 
arithmatic operators +, -, x or The population P(t) 
is then iterated by following the steps given below: 
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1. Evaluate chromosomes in P(t) using an objective 
function that is a measure of fitness. Sort the pop- 
ulation P(t) according to fitness values. 

2. Preserve a portion of good chromosomes of P(t) by 
copying them on another population P(t+1) using 
the copy operator. This would assure that the best 
chromosomes found so far are not lost due to the 
application of genetic operations on P(t). This fea- 
ture is known as elitism. 

3. Select pairs of chromosomes from the remaining 
portion of P(t) (typically using a roulette wheel se- 
lection criteria that gives more preference to chro- 
mosomes having higher fitness values) and recom- 
bine the pairs (i.e. parents) stochastically to gen- 
erate offsprings and put them in P(t+1). 

4. Mutate offsprings in P(t+1) stochastically. 

5. Steps 3 and 4 give rise to the next generation pop- 
ulation P(t+1). 

6. Replace newly generated population P(t+1) with 
P(t) 

7. Advance time t to t + 1. 

8. Verify the termination criteria (i.e. whether gen- 
eration number t has crossed a preassigned upper 
limit or whether convergence for fitness values of 
top chromosomes have been achieved). 

Once satisfactory solution(s) have been found, the itera- 
tion is stopped. 

We have followed the general outline of Genetic Pro- 
gramming (GP) as in Szpiro to fit a given data set of 
time series. For the time series considered presently, we 
use 500 points for fitting the data and then carry out an 
out-of-sample prediction. The prediction is done using a 
one-step approach and a dynamic iterative approach, to 
be described later. 

For all the time series considered presently, we assume 
the map equation to have the form 



X, 



f(X t - T , X t -2r, X t -3 T , ...Xt-dr) 



(1) 



where f represents a function involving time series val- 
ues X t in the immediate past, arithmatic operators (+, 
-, -k and -h) and numbers bound between -10 and 10 with 
a precision of 1 digit; d represents number of previous 
time series values that may appear in the function and r 
represents a time delay (to be described later in Sec. IV 
B). 

The sum of squared errors, 
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is minimized, where N represents number of X t values 
[Eq. JTJ] that are fitted during the GP optimization. 



For a given chromosome, the lower the above sum of 
squared errors, the fitter is the chromosome. The fitness 
measure derived from A 2 is defined as: 
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is the average of all X t (Eq. ^) to be 



where Xf 1 
fitted. 

As described in the Genetic programming is dis- 
couraged to overfit by generating longer strings of chro- 
mosomes. This is achieved by modifying the fitness mea- 
sure as follows, 



l-(l-i? 2 ) 



N - 1 
N-k 



(4) 



where N is the number of equations to be fitted in the 
training set and k is the total number of time lagged vari- 
ables of the form A t _ T , X t -2 T , ■■■ etc (including repeti- 
tions) occurring in the given chromosome. This modified 
fitness measure prefers a parsimonious model. For R 2 
close to 0, r can be negative. 



III. GP MODEL FOR TIME SERIES OF 
KNOWN SYSTEMS 

We consider the time series of 2 known systems, namely 
1) Logistic map and 2) Rossler system. 



A. Time Series of Logistic Map 

The Logistic map is defined by the equation 

X n+1 = rX n (l - X n ) (5) 

We have chosen the control parameter r as 3.891 so 
that Eq. (J5J) generates a chaotic time series. Choosing 
Ao=0.1 and bypassing initial 2000 transient points we 
generate the time series. 

We then use GP to fit the data set of 500 points to 
get the best possible map function using d=l and r=l. 
The fit obtained is very good giving A 2 =3.523*10~ 10 and 
modified fitness r=1.0. The map equation for X t as a 
function of time lagged variables as obtained by GP is as 
follows: 

x t = xf/xf 

xf = -3.9(X tl + 2.9)X tl (X tl - 0.0521)(X tl - 0.0693)(X tl - 0.976) 
(Xt! - l-0) 2 (X tl - 7.46)(xfj + 2.25X tl + 1.46) 

(x| x + 1.03X tl + 1.63)(X^ 1 - 1.41X tl + l.S2)(X^ 1 - 3.49X tl 4- 3.30) 
X t D = (X tl +2.9)(X tl - 0.0522)(X tl - 0.0693)(X tl - 0.976) 

(X tl - 1.0)(X tl - 7.46)(X^ 1 + 2.26X tl + 1.4S)(xf x + 1.03X tl + 1.63) 
(Xjj - 1.41X tl + 1.83)(X^ 1 - 3.49X tl + 3.3) (6) 
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FIG. 1: Out-of-sample one-step prediction of 200 points for 
Logistic time series beginning at data point 6001. 
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FIG. 2: Out of sample prediction for Logistic map using GP 
solution. 



TABLE I: NMSE (Eq. [7J for Logistic time series using one- 
step prediction of 200 points starting at different data points. 



Starting data point 


NMSE 


1001 


8.18436e-12 


2001 


1.01925e-ll 


4001 


8.78014e-12 


6001 


9.97224e-12 


8001 


8.80229e-12 



We use the notation Xt m =Xt~ m *T and show double 
precision numbers to only 3 significant digits for the sake 
of simplicity. It can be seen that many of the factors in 
the numerator and the denominator approximately can- 
cel (apart from higher precision effect) and the resulting 
simplified form of the map is X t ~ 3.9AT t _ T (l — X t - r ) 
that is remarkably close to the actual map equation. 

The normalized mean square error (NMSE), 
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NMSE = 



N variance of N data points 



(7) 



is used as an index for the goodness of fit. For the 
above GP fit for the dataset of 500 points of Logistic 
map, we get NMSE=9.84294*10" 12 . 

The map is then used to make an out-of-sample one- 
step prediction (in which given time lagged values are 
successively used to predict the next dataset value) start- 
ing at different regions in the time series. 

Fig. ^shows out-of-sample one-step prediction starting 
at data point 6001. Table |T| shows NMSE values for 200 
point one-step prediction starting at different data points 
outside the fitted dataset. Note that the predictions are 
in almost perfect agreement with the data. 

As for any other model, the real test of the GP so- 
lution lies in making a dynamic prediction, in which it 
is assumed that data beyond the fitted dataset are not 



available and hence calculated values are progressively 
used beyond the fitted dataset. Fig. |2 shows the dy- 
namic prediction for 35 points beyond the dataset of 500 
fitted points. 

It is seen that the dynamic prediction holds good for 
around 28 steps (the % error at point no. 28 being 
1.48*10~ 4 ). After that the prediction deteriorates due 
to the chaotic nature of the time series. As is known, 
Lyapunov exponents provide a quantitative measure of 
the sensitivity of the initial condition in a given chaotic 
system (i.e. the measure of the divergence of neighboring 
trajetories exponentially in time). It is therefore useful 
to calculate the Lyapunov exponent for the Logistic map. 
We have used the programs given in Kantz et al for 
the calculation of Lyapunov exponents. The Lyapunov 
exponent for the Logistic time series considered is 0.471. 
It may be noted that this method for the calculation of 
Lyapunov exponent has a possible element of small er- 
ror due to selection of linear part from multiple curves 
for finding its slope. Using this Lyapunov exponent, an 
initial error ofl.61*10~ 7 in the first step of dynamic pre- 
diction is expected to grow to around 0.1 (and rapidly 
to higher values there after) around 28 points. Thus in 
this case we understand the inherent limitation of itera- 
tive dynamic prediction due to chaotic nature of the time 
scries. 



B. Time Series of Rossler system 

Next we consider the time series generated from dis- 
cretized Rossler equations 

X t+ s = X t - [(Y t + Z t )]6 
Y t+ 8 = Y t + [{X t + aY t )]6 
Z t+S = Z t + [b + X t Z t -cZ t }6. (8) 

The parameters in Rossler equations are selected as in 
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Q and are: a=0.2, b=0.2 and c—5.7. The time series 
for X t , Y t and Z t are generated with initial conditions as 
x =-l, yo=0 and zq=0 and 5=0.02 and using every 50 th 
point of the series generated. 

A GP fit is then made on the dataset of 500 points 
with values d=8 and r=l used by Szipro [3j. The map 
equation generated from the fit using GP gives fitness 
value as 0.9874 and the map equation is: 



x" = (0.111X t5 4- 1.257)(X tl - X t2 - 0.14(63.83 - 15.1X t4 

+ (X tl + 2X t4 + 10.175)(0.149(X t4 + 10)X t5 (X tl - X t2 - 1.0) + 1.061) 
-0139X t6 (X tl + 2X t4 + 10.175)(3X t2 + 0.175) + X t2 + X t7 

0.139(210.69 - 7.525X £1 - 7 y 56T - 7.9X t4 ) 

+ - ) 

481.08 - X t6 + X t5 4- 59.04X t4 + X tl 

X° = (X tl + 7.2X t4 - X t5 - X t6 + 57). (9) 



We use the notation X tm =X t - m *T and show double 
precision numbers to only 3 significant digits for the sake 
of simplicity. 

Fig. 13 shows comparision between given and GP cal- 
culated values of 500 data points. The thick line close to 
0.0 marks the difference between the given and calculated 
values and indicates that the fit is quite good. 

Next we carry out an out-of-samplc prediction beyond 
the fitted dataset of 500 points. 

Fig. 21 (a) shows 1-step prediction for 100 points, ft 
can be seen that the 1-step prediction is quite good. The 
NMSE (Eq. Ql value for the one-step fit is 0.0118. Next a 
dynamic prediction is made using the GP solution. Fig. 
21 (b) shows comparison between actual and predicted 
values. As can be seen from the figure, the prediction 
is quite good for around 12 points beginning from data 
point 501. The Lyapunov exponent for the time series 
considered is 0.22. Using this value of Lyapunov ex- 
ponent, it is estimated that the initial error of 0.08438 
would grow to 1.1825 on the 12 th step and to higher val- 
ues rapidly thereafter. This again is in close agreement 
to the trend observed in Fig. 0] (b) showing the dynamic 
prediction by GP solution. 

The NMSE value for dynamic prediction of 12 points 
beyond the fitted dataset is 0.048. 



IV. ANALYSIS OF TIME SERIES OF REAL 
SYSTEMS 

Next we consider the time series of 2 real systems, 
namely 1) ion saturation current in Aditya Tokamak 
plasma and 2) financial NASDAQ composite index. Since 
these scries have stochastic fluctuations, it is necessary 
to filter them to obtain the trend. The trend is then 
modelled using GP approach. It is therefore required to 
filter these series using an appropriate method. 



A. Smoothening the Time Series Data Set for real 
systems 



A number of methods exist in the literature and have 
been used for separating fluctuations from trend in a time 
series. In this context, it is important to point out that 
most time series of real systems with complex dynam- 
ics are non-stationary in nature. Consequently in such 
cases, we need to employ a suitable method to separate 
the fluctuations from the trend. We use discrete wavelet 
transforms (DWT) 6] for this purpose. Such an approach 
has recently been proposed by Manimaran et al |l2| and 
Ahalpara et al [13|]. The basic reason for our choice of 
DWT is related to their nice mathematical properties. In 
the present context, it is sufficient to note that (i) DWT 
provides a complete orthonormal basis to decompose a 
non-stationary signal and (ii) wavelet functions have a 
finite number of moments that are zero. In our work we 
have used length-4 Daubechies-4 (Db-4) wavelet trans- 
form. It is one of the simplest and smallest (even) length 
wavelet transform that is smooth. We next describe in 
brief our wavelet based procedure. 

Given a time series composed of n points, namely X%, 
i=l, 2, ... n, we first carry out a forward Db-4 wavelet 
transformation 6] that gives n wavelet coefficients. Of 
these, half (n/2) are low pass coefficients that describe 
the average behavior locally and the other half (n/2) are 
high pass coefficients corresponding to local fluctuations. 
In order to obtain a smooth time series, the high pass co- 
efficients are set to zero and then an inverse Db-4 trans- 
formation is carried out. This results in smoothening of 
the data set (see Manimaran et al 0] and Ahalpara et 
al H3)- While using the Db-4 transform it has been ob- 
served that due to fixed boundary of the data set, rapid 
fluctuations are observed towards the beginning and the 
end of the smoothened data set. In order to remove this 
spurious effect, we do a padding of the data set by adding 
constant valued n/2 data points at the beginning and 
n/2 data points at the end of the time series. The con- 
stant value matches with the value of the first and the 
last data point respectively. The forward Db-4 transfor- 
mation is then applied on the data set having 2n data 
points. Having smoothened the data set by an inverse 
Db-4 transformation, the padded data sets (containing 
the spurious effect) are removed thereby getting an im- 
proved smoothened time series of the n data set points. 
We thus obtain a level one time series of trends in which 
fluctuations at the smallest scale have been filtered out 
and the trend extracted after applying Db-4 transform. 
One can repeat the entire process on level one series to 
filter out fluctuations at the next higher time scale to 
get a level two smoothened series and so on. It is worth 
emphasizing that the low-pass wavelet coefficients give 
a representation in the transformed space of the smooth 
determistic part of the time series. 
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FIG. 3: GP fit of 500 points for Rossler time series, (a) and (b) show initial and last 250 points of the 500 point time series 
respectively. 
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FIG. 4: Out of sample prediction beyond the fitted dataset of 500 points for Rossler time series, (a) and (b) show 1-step 
prediction (for 100 points) and dynamic prediction (for 15 points) respectively. 



B. State Space Reconstruction 

We use the standard time delay embedding approach 
using the smoothened time series as a means of recon- 
structing the vector space that is equivalent to the orig- 
inal state space of the system. In order to carry out the 
embedding we need to determine 2 parameters, namely 
time delay r and embedding dimension d. 

Average mutual information analysis is used to obtain 
the time delay r and the number of false nearest neigh- 
bors analysis is used to obtain the embedding dimension 
d. These two methods are described by Abarbanel et al 

We use the prescription I(r)/I(0) « 0.2 suggested by 
Abarbanel et al jlj] for choosing the time delay r. Here 



I(t) represents average mutual information as a function 
of time lag r. 

The dimension d is fixed by choosing the smallest di- 
mension for which number of false neighbors become zero. 
Further we require that the number of false neighbors 
consistently remains zero thereafter for higher dimen- 
sions. We have used this criterion for all the time series 
considered in present analysis. 



C. Aditya Tokamak data 

We first consider the experimental time series of ion 
saturation current in Aditya Tokamak plasma @- This 
series is first smoothened using Db-4 transformation with 
level=l, 2 and 3. 
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TABLE II: Time lag r and dimension d obtained for Aditya 
time series using average mutual information and % of false 
neighbors analysis. 



Level 


Time lag r 


Dimension d 


1 


1 


5 


2 


1 


5 


3 


1 


8 



TABLE III: The fitness parameters for Db-4 smoothened level 
1, 2 and 3 Aditya time series obtained using GP fit on datasets 



of 500 points each 






A 2 


Fitness 


Aditya series (Db-4 level= 


= 1) 


3.3685 


0.9724 


Aditya series (Db-4 level= 


=2) 


2.6329 


0.9776 


Aditya series (Db-4 level= 


=3) 


1.8594 


0.9834 



The time lag r and dimension d of the embedded vec- 
I 



tors are found by average mutual information analysis 
and number of false neighbors analysis. These are shown 
in Table CD 

A GP fit is made on the datasets of Aditya time series 
having 500 points each with Db-4 level 1, 2 and 3 time 
series. As shown in Table ITTT1 the fitness value for the fits 
for these series are comparable and is maximum for level 
3 series. It may be noted that the original Aditya time 
series is rather coarsely measured and therefore we find 
it more appropriate to use Db-4 smoothened level 2 and 

3 series and not consider level 1 series. It also turns out 
that the GP solution for Db-4 level 1 Aditya time series 
is quite involved one and is also therefore of less interest 
to analyse further. 

We have therefore considered here the analysis for Db- 

4 level 2 and 3 Aditya time series. 

The best solutions found by GP, having fitness value 
of 0.9776 (for level=2) and 0.9834 (for level=3) Aditya 
time series are: 



vlevel—2 v 
-*t — A «l 



Xti — X t 2 



Y , x tl x t3+ x tl x t4- x t5 •,, Y 
A t2 ( Jf— ^){X t 3 + 



x tl+ x t2- x t4+ x t5 + 



3 x tl+ x t3- x tl x t3+ x tS 



X 2 

\^level—3 tl /-ia\ 

X t - ~ , 0.336X ?1 - ■ ( 10 ) 

A -t2 1 ; ».(i -Vt, 

t i x ~ 



tz 7.HX tl + 21UX 

"2(2.12X tl +0.11X.2 - <L^+ gggS ] ) 

tL 41 -"tl x t2 2.1X tl -Xf 1 +X t2 



We use the notation Xt m =Xt—m*T and show double 
precision numbers to only 3 significant digits for the sake 
of simplicity. 

Fig. [5] (a) and (b) show the fit obtained by the GP 
solutions (Eq. Il()[l for level=2 and level=3 resepctively. 
Both the fits are found to be quite good. 

Having obtained the map equations, it is interesting 
to see how well these solutions work in different regions 
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(a) 

FIG. 5: GP fit for 500 points of leveh 
Aditya time series data. 



(b) 



=2 (a) and level=3 (b) 



FIG. 6: GP predictions for 500 points of level=2 (a) and 
level=3 (b) Aditya time series data outside the fitted region 
of 500 points beginning at point 2000 each. 



of the time series out side the fitted region. We have 
selected such data sets at 7 different regions beginning at 
data point lying between 1000 and 7000. 

Fig. [S] show 1-step out of sample prediction for 500 
points beginning at point 2000 for level=2 series (a) and 
level=3 series (b). Table llVl shows the NMSE values for 
the above 7 regions in which the fit is tested. It can 
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TABLE IV: NMSE (Eq. for out-of-sample prediction for 
500 points of Aditya Tokamak time series at 7 different regions 
lying within data point 1000 to 7000. 

level = 2 level = 3 

Starting point NMSE Variance NMSE Variance 



1000 


0.034 


0.229 


0.019 


0.217 


2000 


0.040 


0.255 


0.017 


0.233 


3000 


0.056 


0.188 


0.032 


0.174 


4000 


0.066 


0.250 


0.025 


0.208 


5000 


0.058 


0.209 


0.031 


0.187 


6000 


0.041 


0.230 


0.026 


0.215 


7000 


0.031 


0.308 


0.023 


0.278 



TABLE V: Time lag r and dimension d for Db-4 transformed 
level 1, 2 and 3 NASDAQ time series. 

Time Lag r Dimension d 

Original NASDAQ series 2 4 

NASDAQ series (Db-4 level=l) 2 3 

NASDAQ series (Db-4 level=2) 3 4 

NASDAQ series (Db-4 level=3) 3 5 



be seen that the 1-step prediction is on the whole quite 
good. 

The Lyapunov exponents for the Db-4 level=2 and 
level=3 Aditya time series conisdered above are 0.276 
and 0.361 respectively. Using these Lyapunov exponent 



values, it is estimated that the calculated initial error 
of 0.01766 (in 501 st step) would grow in 10 steps to 
0.2117 for level=2 series and the calculated initial er- 
ror of 0.03779 (in 501 s * step) would grow in 10 steps to 
0.9737 for level=3 series. It is therefore expected that the 
dynamic predictions for these series would be in general 
difficult. However we have not made such predictions 
because they are not physically interesting. 



D. NASDAQ composite index time series 

The NASDAQ time series considered corresponds to 
daily closing values of the composite index for the dura- 
tion 2-Mar-1998 to 27-Mar-2002. We have divided each 
data value of NASDAQ series by a factor of 1000 for com- 
putational convenience. This series is first smoothened 
with Db-4 transform with level=l, 2 and 3. 

The time lag r and dimension d of the embedded vec- 
tors are shown in Table IVl 

In the following we present results for GP solutions 
for Db-4 level=2 and level=3 smoothened time series. A 
GP fit is separately made on the datasets of these two 
NASDAQ time series having 500 points each. 

The GP solutions for the above two smoothened time 
series are shown in Eq. lllfl where we use the notation 
X tm =X t - m *T and show double precision numbers to only 
3 significant digits for the sake of simplicity. 



x ievei=2 _ 5.5X^(X tl - 4.305) 



7.245^ - 31.193A tl + X tl X t2 - 4.305X t2 

D.179A ti 

0.81 ( lX £~ll X e t5 )+0.304 



X level=i = QM7X n + : "™ V " ^ (11) 

v ^ t; I i ni Y _i v ^t3- 5 - b ' I 0.72 

^.45^-0 "I- 1-Ul^. t2 -f 5 .6X t3 (X t2 -4X t 5 + 15.1)-3.7X t B + X t5 [3.287+- 



From Table EH we see that the fitness values for the 
two GP solutions are very good. It must be pointed out 
that we have enforced persistent solutions in the initial 
pool of chromosomes having fitness values 0.9924 and 
0.9962 for level 2 and 3 series respectively to obtain the 
GP solutions (Eq. lllfl . This feature of enforcing persis- 
tent solutions is explained in Appendix Sec. 2. 

Fig- U\ (a) and (b) show comparision between given 



TABLE VI: The fitness parameters for Db-4 smoothened 
level 2 and 3 NASDAQ time series obtained using GP fit on 
datasets of 500 points each. 









A 2 


Fitness 


NASDAQ series 


(Db-4 level= 


=2) 


1.02666 


0.9959 


NASDAQ series 


(Db-4 level= 


=3) 


0.324 


0.9986 



and GP calculated values of datasets of 500 points for 
level=2 and level=3 series respectively. The thick lines 
close to 0.0 in the figures indicate the difference between 
the given and calculated values and the small values for 
differences indicate that the fits are reasonable. The fit 
is better for level=3 series as compared to level=2 series. 

Next we carry out a one-step out-of-sample prediction 
beyond the fitted dataset of 500 points. 

Fig. |S| (a) and (b) show one-step prediction for 500 
points beginning at data point 501 for NASDAQ levcl=2 
and level=3 smoothened time series respectively. It can 
be seen that the one-step prediction is quite good. The 
NMSE values (Eq. [7J for the one-step prediction for 
100 points are 0.0593 and 0.0242 for level=2 and level=3 
smoothened time series resepctively. 

Next a dynamic prediction is made using the GP solu- 
tions. It may be noted that the first point (i.e. data point 
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(a) (b) 

FIG. 7: GP fit for datasets of 500 points for NASDAQ time 
series for level=2 (a) and level=3 (b). 



the figure, the predictions are not good . The dynamic 
prediction for 10 points gives NMSE value as 2.325 and 
3.102 for level=2 and level=3 series respectively. 

The Lyapunov exponents for Db-4 level=2 and level=3 
NASDAQ time series are 0.127 and 0.133 respectively. 
Using these Lyapunov exponents it is estimated that an 
initial error of 0.026 in 502 nd step (level=2) would grow 
in 10 steps to 0.093 and an initial error of 0.036 in 502 nd 
step (level=3) would grow in 10 steps to 0.14. In Fig. 
the error between calculated (dynamic) and given data 
values on the 10 th step are 0.078 (for level=2) and 0.21 
(for level=3). 
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(a) (b) 

FIG. 8: Out of sample prediction beyond the fitted dataset of 
500 points for (a) NASDAQ Db-4 level=2 time series (d=4, 
t=3) and (b) NASDAQ Db-4 level=3 time series (d=5, r=3) 

501) is not guaranteed to match exactly. However con- 
sidering the fact that the general dynamics of the time 
series would have been captured by GP, the calculated 
series is shifted such that the data point 501 is matched 
with the given value. This requires a marginal shifting 
for the two calculated series value by -0.0219 (for level=2 
series) and by -0.0073 (for level=3 series). 

Fig. (a) and (b) show comparison between actual 
and predicted values for NASDAQ level=2 and level=3 
smoothened series resepectively. As can be seen from 



V. CONCLUDING REMARKS 

Our findings with different data sets suggest that mod- 
eling deterministic time series using Genetic Program- 
ming is a very promising approach. This is particularly 
the case for data of real systems having complex dynam- 
ics after the statistical fluctuations are filtered out. The 
filtered series is to modeled by GP and the fluctuations by 
their statistical properties. Also it may be noted that as 
against other modeling approaches, GP does not require 
the calculation of any derivatives within the optimization 
procedure, and therefore it is well suited for the modeling 
of rapidly fluctuating time series. 

Our results for Logistic map and Rossler discretized 
system are quite impressive, giving good fitness values, 
good NMSE values, good 1-step predictions and above 
all good dynamic predictions. On the other hand the GP 
models for time series of real systems are satisfactory for 
giving good fitness values. 

As we have seen, a short coming of this method so far 
is that for real systems the iterative dynamic predictions 
are poor. In view of this we have developed (see Ap- 
pendix Sec. 3) an extension of GP to include fitting a 
sequence of values of the time series. This will enable us 
to capture patterns in the time series data. This work 
will be reported separately. 
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FIG. 9: Out-of-sample dynamic prediction beyond the fitted 
dataset of 500 points for (a) NASDAQ Db-4 level=2 time 
series (d=4, r=3) and (b) NASDAQ Db-4 level=3 time series 
(d=5, r=3) 



APPENDIX: IMPLEMENTATION OF GENETIC 
PROGRAMMING 



Here we give details of implementation and improve- 
ments that have been made in our work. 
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1. Introduction to Genetic Programming 

A Genetic Programming (GP) considers an ensemble of 
chromosomes, called the population, as the starting point 
and then processes it from one generation to the next. A 
given chromosome encodes a candidate solution of the 
optimization problem. The fitness of the chromosome 
is decided by an objective function that maps the chro- 
mosome structure to a fitness value. It is assumed that 
highly fit chromsomes are more likely to breed offsprings 
for the next generation. Genetic operators, namely copy, 
crossover and mutation are applied to generate a new en- 
semble of chromsomes. As a result of this evolutionary 
cycle of selection, crossover and mutation, more and more 
suitable chromsomes for the given optimization problem 
emerge within the population. It is at the discretion of 
the user to select the top (or one of the top most) chromo- 
some of the population at the end of a sufficient number 
of generations of the population. 



2. Implementation details of Genetic Programming 

It is required to address following issues while consid- 
ering GP as a means of solving an optimization problem: 

• Structure of chromosome: We have used a binary 
tree representation of the chromosome as described 
below. 

• An objective function for assessing the fitness of a 
chromosome: Eq. SJthat uses Eq. [3 and Eq. [31 is 

used as an objective function. 

• Values of parameters of the GP, such as the follow- 
ing: 

1. Population size: We have used 400 chromo- 
somes for 1-step fit. 

2. Number of generations: Maximum number of 
generations is set at 5000, however if the fit- 
ness of the top chromsome does not vary siz- 
ably for around a few hundred generations, 
then iterations are stopped. 

3. Probabilities for appying genetic operators: 
Whenever an operator needs to be chosen at 
random, we use 40% and 60% for selecting ei- 
ther a number (from -10.0 to 10.0 with the 
precision of one decimal) or a time lagged vari- 
able. For the mutation probability we have 
used 90% for enforcing mutation leaving aside 
top 10% chromosomes from getting mutated. 
This is referred to as elitism in GA literature 
that helps preserve a proportion of top most 
population in successive generations thereby 
not loosing whatever good that has been found 
so far. 



For the optimization problem being considered 
presently, the structure of a chromosome is an equation 
of the following form: 

A"t = /(A t _ T , A t _2r ! A t _3 T , ...Xt-dr ) 

where d is the embedding dimension and r is the sam- 
pling time. The aim of Genetic Programming is to find 
out the best possible functional form of f that gives rise 
to the map of the given time series in terms of the pre- 
vious time lagged components. Once the map equation 
is available by fitting a given set of data points, it can 
be used for predicting the future state of the system, i.e. 
generate out-of-sample time series data. 

The structure of chromosome is in the form of an al- 
gebraic expression involving binary operators, numbers 
and time lagged variables of the form A t _ T . We have 
used a binary tree representation using non-linear dy- 
namic data structures for the structural representation 
of chromosomes. 

The binary tree structure has several advantages over 
a linear structure of characters: 

• Brackets are not required to be stored explicitly in 
the binary representation. 

• The genetic operations on the chromosomes, are 
considerably simplified as only pointers need to be 
manipulated while carrying out the crossover oper- 
ation. 

• Since binary tree is grown using a dynamic data 
structures, it is not required to specify an upper 
limit for number of tokens in the expression tree. 
This eliminates the need for boundary level check- 
ing for the overflow of the size of expression tree be- 
yond the specified limit. In such a case of overflow 
of expression size, one normally brings back the ex- 
pression to a template structure thereby loosing the 
structural information found so far. 

• It has been found that many of the GP solutions 
turn out to be in the Pade form. It is quite straight 
forward to check whether a given GP solution is in 
the Pade form or not. This is done just by inspect- 
ing the operator in the root node of the tree and 
confirming whether it is division operator or not. 
In the same way, it is quite straight forward to op- 
tionally impose a Pade form for GP solution. 

• Evaluation of the expression in binary tree repre- 
sentation is considerably simpler (each leaf sub-tree 
is evaluated and collapsed to a numerical value us- 
ing a recursive procedure till the whole tree finally 
reduces to a number). In contrast, if an expression 
has a linear structure (e.g. an array), then multiple 
passes of array are required to give due credence to 
the hierarchy of operators. 

It is observed that for a non-stationary time series 
(e.g. NASDAQ series), a persistent solution X t —X t -i, 
although not the best possible solution, usually gives 
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quite a good fitness value close to 1.0. Further it is ob- 
served that if during GP iterations such persistent solu- 
tions are found, they tend to dominate the population 
thereby giving rise to convergence to such trivial solu- 
tions. Since these trivial solutions are of no interest from 
the viewpoint of underlying dynamics of the system, it is 
essential to get the GP away from such trivial solutions. 
It is interesting to understand the reason why the ma- 
jority of the population tends to get flooded by trivial 
solutions. This is because the tree for such solutions is 
a single node tree and so crossover cannot lead to new 
solutions. Mutation would also not help in such situa- 
tions because GP settles down to a single node solution 
corresponding to the best possible time lagged variable 
(say Xt-i), and changing this variable to another time 
lagged variable leads to decrease in fitness value. Thus 
GP optimization is virtually helpless i.e. is left without 
any means to come out of the local minima arising out 
of persistent solution. 

In order to resolve this problem, we use various forms 
of multipliers to the GP chromosome. The multiplier can 
be a constant number C or possibly a Gaussian curve 
passing through the data set of points to be fitted. In 
case of a constant multiplier, the trivial solution is in 
the form of Xt-i/C and it is easy to see that GP can 
now possibly vary and grow this structure by the genetic 
crossover and mutation operators to optimize the fitness 
value. In fact, we have been able to transform the prob- 
lem (arising out of convergence to local minima due to 
such trivial solutions) to our advantage by starting the 
GP iteration from the reasonably higher fitness values of 
trivial solution. On the other hand if we start with a 
purely random population, we have usually found that 
the top most chromosome in the population has a much 



lower value of fitness compared to that of a trivial solu- 
tion and it has to undergo a substantially large number 
of iterations to reach to this fitness value if not surpass 
it. 



3. Multi-step GP fit 

The usual method adopted for prediction is to first fit a 
given data set of points and then make out-of-sample pre- 
diction using either 1-step or multi-step (dynamic) pre- 
diction. The 1-step predictions obtained by GP fit are 
usually found to be good. The real test of any dynamic 
model however lies in whether it is able to make out-of- 
sample dynamic predictions outside the range of data set 
used for fitting by GP. It is observed that the dynamic 
prediction is either not good or at the most it gives good 
predictions only for a very few number of points. In a 
way this is understandable because our fit itself is of 1- 
step nature. The number of equations to be fitted by 
GP during an iteration are all trying to fit the next point 
using a map involving its time lagged variables occur- 
ring in immediate past. Thus GP is not trained to make 
multiple time step predictions. We have therefore also 
considered a multi-step fit by GP. We require GP to fit 
not just the next step X t (of Eq. [TJ, but also X t +\, X t +2, 
... X t +rn, where m is a predefined number of steps. Thus 
for a given equation to be fitted, the sum of squared er- 
rors invloves not only [Xf alc — xf lven ] 2 , but in addition 

[X calc _ X?lT] ^ up tQ [Xf „U _ xflZ n ]2 _ However 

it may be noted that multi-step GP fit leads to a very 
involved computation and we are currently able to carry 
out such GP fit for a limited population size only. 
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